Benfeng Xu

WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

Feb 03, 2026
Via arXiv

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Feb 03, 2026
Via arXiv

Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles

Feb 03, 2026
Via arXiv

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Feb 02, 2026
Via arXiv

DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report

Jan 13, 2026
Via arXiv

SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks

Jan 07, 2026
Via arXiv

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Jun 13, 2025
Via arXiv

From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding

Jun 04, 2025
Via arXiv

Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability

May 30, 2025
Via arXiv

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

May 27, 2025
Via arXiv